STAT 679: Problem Set #3

Q1. Glacial Lakes

The data at this link contain labels of glacial lakes in the Hindu Kush Himalaya, created during a 2015 ecological survey by the International Centre for Integrated Mountain Development.

Part (a)

a.1 How many lakes are in this dataset?

library(sf)
library(tidyverse)
lakes <- read_sf("../data/GL_3basins_2015.geojson")
num_lakes <- length(unique(lakes$GL_ID))  # count distinct lake IDs

There are 3624 lakes in this dataset.

a.2 What are the latitude / longitude coordinates of the largest lakes in each Sub-basin?

largest_lakes <- lakes %>%
  group_by(Sub_Basin) %>%
  filter(Area == max(Area)) %>%
  ungroup() %>%
  select(GL_ID, Sub_Basin, Area, Latitude, Longitude)
GL_ID Sub_Basin Area Latitude Longitude geometry
GL086304E28374N Arun 4.0192059 28.37403 86.30475 POLYGON ((86.29541 28.35189…
GL085838E28322N Sun Koshi 5.4113912 28.32223 85.83813 POLYGON ((85.84835 28.32543…
GL086925E27898N Dudh Koshi 1.3428310 27.89853 86.92510 POLYGON ((86.9127 27.89883,…
GL087866E27869N Tamor 0.6961043 27.86953 87.86618 POLYGON ((87.85986 27.86149…
GL086447E27946N Tama Koshi 1.7197430 27.94679 86.44713 POLYGON ((86.43581 27.93609…
GL085717E28042N Indrawati 0.0269401 28.04209 85.71757 POLYGON ((85.71581 28.04205…
GL086542E27713N Likhu 0.0903048 27.71321 86.54286 POLYGON ((86.54486 27.71284…
GL082948E29196N Bheri 4.9235690 29.19634 82.94822 POLYGON ((82.94605 29.21763…
GL082423E29384N Tila 0.4322915 29.38408 82.42375 POLYGON ((82.42288 29.38101…
GL081554E29648N Karnali 0.1262245 29.64815 81.55488 POLYGON ((81.55656 29.64652…
GL082414E29753N Mugu 0.4283468 29.75376 82.41434 POLYGON ((82.41277 29.75636…
GL081526E29772N West Seti 0.5075936 29.77282 81.52618 POLYGON ((81.52834 29.77348…
GL081577E29897N Kawari 0.2839747 29.89755 81.57738 POLYGON ((81.57937 29.89932…
GL081780E30128N Humla 0.7565747 30.12892 81.78076 POLYGON ((81.77613 30.13338…
GL080178E30564N Kali 0.2396768 30.56462 80.17877 POLYGON ((80.18417 30.56559…
GL084116E28446N Seti 0.1024868 28.44681 84.11694 POLYGON ((84.11637 28.44483…
GL085519E28467N Trishuli 0.4539165 28.46783 85.51938 POLYGON ((85.51812 28.47, 8…
GL084628E28596N Budhi Gandaki 0.2674945 28.59618 84.62826 POLYGON ((84.63338 28.59795…
GL083851E28690N Marsyangdi 3.3841276 28.69089 83.85184 POLYGON ((83.84369 28.70758…
GL083701E29218N Kali Gandaki 0.4344556 29.21850 83.70156 POLYGON ((83.7026 29.21713,…

Part (b)

Plot the polygons associated with each of the lakes identified in part (a).

Hint: You may find it useful to split lakes across panels using the tm_facets function. If you use this approach, make sure to include a scale with tm_scale_bar(), so that it is clear that lakes have different sizes.

tm_shape(largest_lakes) +
  tm_polygons(col = "Area", palette = "Blues", legend.show = FALSE) +
  tm_facets(by = "Sub_Basin", free.scales = FALSE, ncol = 5, scale.factor = 4) +
  tm_scale_bar(width = 0.5) +
  tm_layout(
    main.title = "Largest lakes in each Sub-basin", main.title.position = c("center", "top"),
    panel.label.bg.color = "darkgoldenrod4", panel.label.fontface = "bold", panel.label.color = page_bg_color,
    bg.color = "darkseagreen", outer.bg.color = page_bg_color, main.title.fontfamily = "Candara"
  )

Part (c)

Visualize all lakes with latitude between 28.2 and 28.4 and with longitude between 85.8 and 86. Optionally, add a basemap associated with each lake.

lakes_subset <- lakes %>%
  filter(between(Latitude, 28.2, 28.4), between(Longitude, 85.8, 86.0))

# cc_location() is from the ceramic package and requires a Mapbox access token
basemap <- cc_location(loc = c(85.9, 28.3), buffer = 1e4)

tm_shape(basemap) +
  tm_rgb(alpha = 0.9) +
  tm_shape(lakes_subset) +
  tm_polygons(col = "deepskyblue3") +
  tm_layout(bg.color = page_bg_color, inner.margins = c(0, 0, 0, 0), frame = FALSE, saturation = -1)

Q2. Australian Pharmaceuticals II

The PBS data set contains the number of orders filled every month for different classes of pharmaceutical drugs, as tracked by the Australian Pharmaceutical Benefits Scheme. The code below takes the full PBS data set and filters it down to the 10 most commonly prescribed pharmaceutical types. This problem will ask you to implement and compare two approaches to visualizing this data set.

pbs_full <- read_csv("../data/PBS_random.csv") %>% 
  mutate(Month = as.Date(Month))

top_atcs <- pbs_full %>%
  group_by(ATC2_desc) %>%
  summarise(total = sum(Scripts)) %>%
  slice_max(total, n = 10) %>%
  pull(ATC2_desc)

pbs <- pbs_full %>%
  filter(ATC2_desc %in% top_atcs, Month > "2007-01-01")

Part (a)

Implement a stacked area visualization of these data.

ggplot(pbs) +
  geom_area(aes(Month, Scripts / 1e6, col = ATC2_desc, fill = ATC2_desc), alpha = 0.6) +
  scale_x_date(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0), n.breaks = 11) +
  scale_fill_brewer(palette = "Paired") +
  scale_colour_brewer(palette = "Paired") +
  labs(
    title = "Sales of the 10 most commonly prescribed drug classes since 2007",
    y = "Orders filled (millions)", fill = "Class of drug", col = "Class of drug"
  )

Part (b)

Implement an alluvial visualization of these data.

# geom_alluvium() is from the ggalluvial package
ggplot(pbs) +
  geom_alluvium(
    aes(Month, Scripts / 1e6, fill = ATC2_desc, col = ATC2_desc, alluvium = ATC2_desc),
    decreasing = FALSE, alpha = 0.6
  ) +
  scale_x_date(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0), n.breaks = 11) +
  scale_fill_brewer(palette = "Paired") +
  scale_colour_brewer(palette = "Paired") +
  labs(
    title = "Sales of the 10 most commonly prescribed drug classes since 2007",
    y = "Orders filled (millions)", fill = "Class of drug", col = "Class of drug"
  )

Part (c)

Compare and contrast the strengths and weaknesses of these two visualization strategies. Which user queries are easier to answer using one approach vs. the other?
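One way to ground this comparison is to quantify how often the drug classes change rank from month to month: frequent rank changes are exactly what the reordering ribbons of the alluvial plot make visible, while the stacked area chart, with its fixed stacking order, is better suited to reading the overall total and the baseline series. A rough sketch of that computation, assuming the `pbs` data frame defined above (the `rank_changes` name and the summing over any duplicate month/class rows are my own assumptions):

```r
library(dplyr)

# For each month, rank the 10 classes by total scripts, then count how
# often each class's rank differs from its rank the previous month.
rank_changes <- pbs %>%
  group_by(Month, ATC2_desc) %>%
  summarise(Scripts = sum(Scripts), .groups = "drop") %>%
  group_by(Month) %>%
  mutate(rank = min_rank(desc(Scripts))) %>%
  group_by(ATC2_desc) %>%
  arrange(Month, .by_group = TRUE) %>%
  summarise(n_rank_changes = sum(rank != lag(rank), na.rm = TRUE)) %>%
  arrange(desc(n_rank_changes))
```

Classes near the top of `rank_changes` are those whose trajectories the alluvial view distinguishes most clearly; if every class keeps a stable rank, the two displays convey nearly the same information and the stacked area's simpler encoding wins.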